
Shannon information and integrated information: message and meaning

Alireza Zaeemzadeh and Giulio Tononi

Department of Psychiatry, University of Wisconsin, 6001 Research Park Blvd, Madison, WI 53719, USA
Correspondence: gtononi@wisc.edu

Acknowledgements. This work was supported by the Templeton World Charity Foundation (TWCF0216). We thank especially Matteo Grasso and Jeremiah Hendren for help with figures and text, and present and past members of the laboratory for helpful discussions.

Abstract—Information theory, introduced by Shannon, has been extremely successful and influential as a mathematical theory of communication. Shannon’s notion of information does not consider the meaning of the messages being communicated but only their probability. Even so, computational approaches regularly appeal to “information processing” to study how meaning is encoded and decoded in natural and artificial systems. Here, we contrast Shannon information theory with integrated information theory (IIT), which was developed to account for the presence and properties of consciousness. IIT considers meaning as integrated information and characterizes it as a structure, rather than as a message or code. In principle, IIT’s axioms and postulates allow one to “unfold” a cause–effect structure from a substrate in a state—a structure that fully defines the intrinsic meaning of an experience and its contents. It follows that, for the communication of meaning, the cause–effect structures of sender and receiver must be similar.


Introduction
Consider a simple scenario in which Alice sees the house on fire and communicates what she sees and thinks to Bob (Fig. 1). We can assume that, by seeing the house on fire and worrying about her cat, Alice’s brain enters a specific state associated with her experience (some neurons firing and some not), after which she texts Bob. When Bob receives the message, it triggers a specific state in his brain, associated with the experience of imagining his house on fire. Clearly, Alice was able to communicate some information about what she saw to Bob, and Bob understood the overall meaning of that information. But how does information convey meaning, and where is the meaning to be found? The term information can be a source of confusion[1] and its relationship to meaning, another loaded term[2, 3], can be problematic. Here, we focus on two fundamental uses of the term: information as messages encoding symbols for transmission, processing, and storage, studied by Shannon information theory[4], and information as meaning, understood as a structure corresponding to a conscious content, addressed by integrated information theory (IIT)[5]. Both Shannon information theory and IIT start from axioms and employ the tools of probability theory, but their formalism and definition of information are different and complementary.
Information theory quantifies the information content of the message as a reduction in uncertainty from the extrinsic perspective of a communication engineer. It provides the formalism for encoding, transmitting, processing, and decoding the message optimally from source to target over a (noisy) channel. As recognized by Shannon himself, it has nothing to say about the meaning of the messages being communicated. IIT, in contrast, quantifies the information content of the experience triggered by the message from the intrinsic perspective of a conscious subject. It provides a formalism to characterize the intrinsic meaning of the experience as a cause–effect structure unfolded from the substrate of the subject’s consciousness. In short, Shannon information theory deals with messages and their extrinsic transmission as codes, whereas integrated information deals with meanings and their intrinsic existence as structures. This distinction is important when it comes to the brain, especially in view of the pervasive reference to “information processing” and “codes” in neuroscience and psychology. In what follows, we briefly introduce some relevant notions from both information theory and IIT, explained in more detail in the Supplementary material. Next, we consider Shannon information and integrated information side by side and highlight some of their differences. We then return to Alice and Bob and ask how the meaning of what she sees (and what he imagines) is instantiated in the activity patterns of their brains. Finally, we consider how meaning can be communicated and the respective roles of Shannon information and integrated information.

Fig. 1. Communicating meaning. Alice is trying to communicate meaning (conscious contents, such as seeing that the house is on fire) to Bob through a message composed of symbols. Shannon information theory takes the extrinsic perspective and studies the optimal assignment of symbols and the optimal design of encoding/decoding algorithms. IIT takes the intrinsic perspective of Alice and Bob and studies the intrinsic meaning of system states as conscious contents as well as how meaning can be communicated.

Shannon information theory
The mathematical theory of communication introduced by Claude Shannon, also known as information theory, is concerned with reliably communicating symbols over noisy channels through optimal coding and the addition of communication and computational overhead (Fig. 2)[4]. Shannon started with a set of desired properties for an information measure, which he called entropy. In a standard formulation, the “axioms” of Shannon information are continuity with respect to the probabilities, additivity, monotonic increase with number of outcomes, maximality for equiprobable outcomes, and symmetry under the permutation of symbols. Information can then be quantified as the number of binary digits (bits) generated by a source (the source entropy) and/or successfully transmitted over a channel (the mutual information). A channel’s capacity is the maximum rate at which information can be reliably transmitted through it (the maximum, over all source distributions, of the mutual information between source and target).
Measures of information satisfying these properties are employed in virtually every area of modern science and society, and are essential to communication engineering, signal processing, machine learning, and neuroscience[6, 7].
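To make these quantities concrete, here is a minimal Python sketch (our own illustration, not code from this work; all function names are hypothetical). It computes entropy, mutual information from a joint distribution, and the capacity of a binary symmetric channel, assumed here to flip each bit with probability eps, by brute-force search over source distributions. For this channel the search should agree with the textbook closed form 1 - H(eps).

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution (zero entries are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits, from a table joint[x][y] = P(X=x, Y=y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return entropy(px) + entropy(py) - entropy([p for row in joint for p in row])

def bsc_joint(q, eps):
    """Joint P(X, Y) for a binary symmetric channel that flips each bit with
    probability eps, fed by a source with P(X=1) = q."""
    return [[(1 - q) * (1 - eps), (1 - q) * eps],
            [q * eps, q * (1 - eps)]]

def bsc_capacity(eps, steps=1000):
    """Capacity as the maximum of I(X;Y) over source distributions (grid search over q)."""
    return max(mutual_information(bsc_joint(k / steps, eps)) for k in range(steps + 1))

eps = 0.1
print(round(entropy([0.5, 0.5]), 3))          # fair coin: 1.0 bit
print(round(bsc_capacity(eps), 3))            # numerical maximum: 0.531
print(round(1 - entropy([eps, 1 - eps]), 3))  # closed form 1 - H(eps): 0.531
```

The grid search is deliberately naive; it only shows that capacity is a maximum of mutual information over source distributions, not a practical way to compute it.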

Fig. 2. Information theory and the communication of symbols. In information theory, the information content of a message is quantified as the minimum number of uniquely decodable symbols (binary digits) assigned to it. In a communication system, unwanted redundancies are removed (source coding) and errors are corrected by adding communication and computation overhead (channel coding). The source–target pair can achieve optimal communication by utilizing the optimal encoding/decoding algorithms designed by an external observer.

Messages (sequences of symbols) transmitted across a channel are encoded and decoded: a code is simply a mapping from one set of symbols to another. Shannon proved that a source can be compressed down to its entropy and that, as long as the transmission rate stays below the channel capacity, messages can be encoded and decoded with an arbitrarily small probability of error. In general, some information will be transmitted as long as there is some statistical dependence between source and target, implying that some question about the source can be answered based on a readout of the target. Importantly, one can design channels such that they extract some “relevant variable” that conveys information about specific questions and ignores others. This can be considered as a form of information processing[8]. However, information processing can never increase the information transmitted over a channel (the data processing inequality)[8]. For Shannon, the “information content” of a symbol or code is purely a function of its probability of occurrence: regardless of how the symbol is interpreted, the less frequent it is, the higher its information content. As he readily recognized[4], information theory is not concerned with the meaning of a symbol: that must be provided by the source and the target or by an external observer who interprets the message or its origin and consequences. Supplementary material I offers a brief summary of some relevant aspects of information theory. For an in-depth treatment, see, for example, [8].
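The claim that processing cannot increase transmitted information (the data processing inequality) can be checked numerically on a toy Markov chain. In the sketch below, which assumes arbitrary noise levels and uses hypothetical helper names, a uniform bit X passes through a noisy channel to give Y, and a second noisy stage that sees only Y produces Z.

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy in bits; zero-probability entries are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(A;B) in bits from a dict {(a, b): P(a, b)}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return entropy(pa.values()) + entropy(pb.values()) - entropy(joint.values())

def flip(bit_in, bit_out, eps):
    """P(bit_out | bit_in) for a binary symmetric channel with flip probability eps."""
    return eps if bit_in != bit_out else 1 - eps

# Markov chain X -> Y -> Z: a uniform source bit, a noisy channel, then a noisy
# "processing" stage that sees only Y.
EPS_CHANNEL, EPS_PROCESSING = 0.1, 0.2
chain = {(x, y, z): 0.5 * flip(x, y, EPS_CHANNEL) * flip(y, z, EPS_PROCESSING)
         for x, y, z in product((0, 1), repeat=3)}

def marginal(key_fn):
    """Sum the chain's probabilities into a pairwise joint selected by key_fn."""
    out = {}
    for xyz, p in chain.items():
        out[key_fn(xyz)] = out.get(key_fn(xyz), 0.0) + p
    return out

i_xy = mutual_information(marginal(lambda t: (t[0], t[1])))
i_xz = mutual_information(marginal(lambda t: (t[0], t[2])))
print(f"I(X;Y) = {i_xy:.3f} bits, I(X;Z) = {i_xz:.3f} bits")  # I(X;Z) <= I(X;Y)
```

Making the second stage noiseless would at best preserve I(X;Y); no stage that depends only on Y can make I(X;Z) larger.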
Integrated information theory
Integrated information theory (IIT) is concerned with determining the quality and quantity of consciousness from the causal properties of a substrate[5] (Fig. 3). To be conscious is to have an experience—seeing a face, hearing a voice, having a thought, witnessing a house on fire, imagining it, or dreaming it. In other words, you are conscious if “there is something it is like to be you”[9]. IIT aims to account for the properties of experience—which is subjective—in objective, physical terms.[10]

Fig. 3. IIT and the structure of meaning. IIT defines information as meaning, understood as the content of experience (the feeling is the meaning). It takes the intrinsic perspective of the system and characterizes the meaning of an activity pattern as the Φ-structure unfolded from a complex (a maximally irreducible substrate). The main complex is outlined in blue on the surface of the cerebral cortex (assuming the maximum of irreducibility is over posterior areas), at the grain of individual neurons. The Φ-structure unfolded from it is depicted by a set of vertices (distinctions) bound by edges (relations). Other regions of the brain, such as the cerebellum and parts of the cerebral cortex, can only support very small complexes and associated Φ-structures because they are organized in a modular manner.


The starting point of IIT is the existence of experience—its “zeroth” axiom. IIT then characterizes the essential properties of experience—those that are immediately and irrefutably true of every conceivable experience—as the five axioms of phenomenal existence[3]:
(1) Intrinsicality. Every experience (say, Alice’s experience of seeing the house on fire) is subjective—for the experiencer, from her intrinsic or private perspective. It cannot be “for no one.”
(2) Information. It is specific—this one, the house on fire. It cannot be “generic.”
(3) Integration. It is unitary—a whole (the house on fire), irreducible to separate experiences. It cannot be “multiple” (one of a house and another of a fire).
(4) Exclusion. It is definite—this whole, containing all it contains, neither less nor more. It cannot be “indefinite.”
(5) Composition. It is structured—composed of distinctions and the relations that bind them together and make it feel the way it feels (the house and its contours, the fire and its flames, all bound to a spatial location, to various colors, and so on). It cannot feel “in no way.” The way the experience feels is also what it means, which is therefore fully intrinsic. In short, the feeling is the meaning[11].
The next step of IIT is to formulate the essential phenomenal properties of experience (the axioms) in terms of corresponding physical properties of its substrate (the postulates). Physical existence is defined as cause–effect power of a substrate of units, understood in operational terms of manipulation and observation: “if we do this” (manipulation—set an initial state), “we see that” (observation—read out the ensuing state). A substrate can thus be fully characterized by its transition probability matrix. On this basis, IIT provides a mathematical formalism to identify a substrate of consciousness and the quality and quantity of experience it supports. The substrate must have cause–effect power whose properties mirror the axioms above: it must be intrinsic (for itself), specific (in its current, specific state, with a specific cause and effect state), unitary (irreducible to its parts), and definite (the set of units that is maximally irreducible). Such a substrate is called a complex, and its irreducibility is measured by system integrated information (φs). Finally, the cause–effect power of the complex must be structured by causes and effects specified by subsets of its units (distinctions) and by their overlaps (relations). The cause–effect structure (or Φ-structure) unfolded from a complex in its current state fully characterizes, with no additional ingredients, how an experience feels—its quality—which defines its intrinsic meaning. The quantity of feeling/meaning is measured by Φ—the sum of integrated information of the distinctions and relations composing the Φ-structure. A content of experience is then a sub-structure within a Φ-structure.
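The operational definition of cause–effect power (“if we do this, we see that”) can be illustrated on a toy substrate. The sketch below is our own illustration and not the IIT 4.0 reference software: the three logic-gate update rules and the helper names are hypothetical. It sets the system into each possible state in turn (the manipulation), applies the update rule and reads out the next state (the observation), and collects the results into a state-by-node transition probability matrix, with optional independent output noise.

```python
from itertools import product

def update_rule(state):
    """Hypothetical update rules for a toy substrate of three binary units (A, B, C)."""
    a, b, c = state
    return (int(b or c),    # A computes OR of B and C
            int(a and c),   # B computes AND of A and C
            int(a != b))    # C computes XOR of A and B

def state_by_node_tpm(rule, n_units, noise=0.0):
    """For every possible current state (set by intervention), record
    P(unit i is ON at the next step). Each unit's output is flipped
    independently with probability `noise`."""
    tpm = {}
    for state in product((0, 1), repeat=n_units):
        next_state = rule(state)
        tpm[state] = tuple((1 - noise) if bit else noise for bit in next_state)
    return tpm

for state, probs in state_by_node_tpm(update_rule, 3).items():
    print(state, "->", probs)
```

With noise set to zero every entry is 0 or 1 (a deterministic substrate); a non-zero noise value turns the rows into genuine probabilities while keeping the same interventional construction.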
IIT has strong explanatory and predictive power. It can account for why certain brain regions support consciousness and others do not[12], why consciousness fades during dreamless sleep[12], and why different kinds of experiences feel the way they do, such as extended in space and flowing in time[13-15]. It leads to counterintuitive predictions that are currently being tested[16], to practical applications[12, 17, 18], and to inferences concerning the presence or absence of consciousness in natural and artificial systems, such as computers[19, 20]. Supplementary material II offers a brief summary of IIT’s concepts, which are presented in detail in “IIT 4.0”[5] and in an online wiki[21].


Shannon information and integrated information side-by-side
The distinctive approach and goals of information theory and IIT are reflected in several differences between Shannon information and integrated information. It is useful to illustrate these differences with simple examples that demonstrate the requirements imposed by IIT’s postulates (Table I and Fig. 4).

(“0th”) Correlational vs. causal. Shannon information requires a correlation between source and target; integrated information requires cause–effect power. In Fig. 4(a), units A and B convey information in the Shannon sense (about one another, being correlated, and about unit X) but do not have any effects of their own. However, by the 0th postulate of IIT, a candidate system must have cause–effect power—the ability to “take and make a difference,” as assessed through interventional probabilities.

(1st) Extrinsic vs. intrinsic. Shannon information can be between any source and any target; integrated information requires intrinsic cause–effect power. Fig. 4(b) shows how Shannon information can be defined over any set of source and target units. However, IIT’s intrinsicality postulate requires that the cause–effect power of a candidate system be over itself.

(2nd) Generic vs. specific. Shannon information is defined over an ensemble of symbols through the entropy function. Similarly, the mutual information between subsets of units measures the statistical dependence between random variables, which is state-independent (Fig. 4(c)). By IIT’s information postulate, however, the cause–effect power of the system must be specific: the substrate must be in a specific state, with these units ‘on’ and those ‘off,’ and select a specific cause and effect state over itself.

(3rd) Segregated vs. integrated. Shannon information benefits from segregation; integrated information requires the substrate’s irreducibility. Fig. 4(d) shows how, due to the additivity axiom of information theory, mutual information adds up when a segregated source–target pair is added. On the other hand, by IIT’s integration postulate, a candidate system must be causally irreducible (φs>0). Therefore, causally segregated units cannot belong together and do not increase the amount of information specified by a substrate over itself. Indeed, the integrated information of a set of causally independent units is zero.

(4th) Additive vs. definite. Shannon information benefits from a larger channel; integrated information requires a maximally irreducible substrate with a definite border and grain. As shown in Fig. 4(e), in information theory, the channel, source, and target units (and grain) over which symbols should be transmitted and read out can be chosen based on extrinsic convenience. Also, the additivity of Shannon information implies that larger (and finer) sources and channels are always potentially more informative. By contrast, in IIT, the exclusion postulate requires that a complex must have a definite border (and grain), which is defined by maximal irreducibility and is thus non-arbitrary. This border excludes any larger or smaller entity constituted of overlapping units. In fact, larger systems may have lower φs values than some of their subsets.

(5th) Holistic vs. structured. Finally, Shannon information is not structured; integrated information is. Fig. 4(f) shows that, in information theory, subsets of the system do not transmit information beyond what is transmitted by the whole. Statistical dependencies among units are considered as redundancy and do not contribute to the information content of the source. In fact, such redundancies are exploited in source coding to obtain a more compressed representation of the symbols. By contrast, IIT’s composition postulate implies that the subsets of a complex contribute information content by structuring its cause–effect power as distinctions (cause–effects of subsets) and relations (overlaps among causes and/or effects), which together compose the Φ-structure of the complex.
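The contrast in Fig. 4(a) between observing and intervening can be reproduced exactly with a toy common-cause model. Under our own assumptions (a uniform bit X copied into both A and B; hypothetical function names), observing A fully predicts B, whereas clamping A with the do-operator leaves B at chance, because A has no effect on B.

```python
from fractions import Fraction

# Hypothetical structural model: X drives both A and B; A has no effect on B.
P_X = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def outcomes(do_a=None):
    """Yield (probability, a, b). If do_a is given, A is clamped by intervention,
    which cuts its dependence on X; B still depends only on X."""
    for x, px in P_X.items():
        a = x if do_a is None else do_a
        b = x
        yield px, a, b

def prob_b1_given_a(a_val, intervene):
    """P(B=1 | A=a_val) when observing, or P(B=1 | do(A=a_val)) when intervening."""
    if intervene:
        return sum(p for p, _, b in outcomes(do_a=a_val) if b == 1)
    joint = sum(p for p, a, b in outcomes() if a == a_val and b == 1)
    marginal = sum(p for p, a, _ in outcomes() if a == a_val)
    return joint / marginal

for a_val in (0, 1):
    print(f"A={a_val}: P(B=1 | A) = {prob_b1_given_a(a_val, False)}, "
          f"P(B=1 | do(A)) = {prob_b1_given_a(a_val, True)}")
```

Observation yields conditional probabilities of 0 and 1 (a perfect correlation carrying one bit), while both interventions yield 1/2: the Shannon dependence is real, but A has no cause–effect power over B.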


Ultimately, because Shannon information is not structured intrinsically, the meaning of symbols (codes) transmitted over a channel must be provided extrinsically, as explicitly recognized by Shannon himself.1 In IIT, instead, the meaning of an activity pattern is given by the Φ-structure unfolded from the complex in that state, that is, intrinsically.
Table I. Shannon information vs. integrated information following IIT’s postulates.

| IIT postulate | Information theory | Integrated information theory |
|---|---|---|
| Existence (0th): correlational vs. causal | Evaluates statistical dependencies through observations. | Evaluates causal interactions through interventions. |
| Intrinsicality: extrinsic vs. intrinsic | Evaluates statistical dependencies between a source and a target. | Evaluates cause–effect power within a candidate system. |
| Information: generic vs. specific | Evaluates statistical dependencies between distributions of symbols, i.e., random variables. | Evaluates the cause–effect power of a specific system state over a specific system state. |
| Integration: segregated vs. integrated | Information can be provided by independent units. | Information must be irreducible to that provided by independent units. |
| Exclusion: additive vs. definite | Adding more units never decreases the information. | Integrated information is maximal for a definite set of units. |
| Composition: holistic vs. structured | Information is not structured, corresponding to the optimal number of binary digits generated or transmitted. | Information is structured, composed of causal distinctions bound by causal relations. |
| Meaning | Must be provided extrinsically by an observer who can interpret a code. | Is defined intrinsically by the Φ-structure unfolded from a complex in a state. |

1 Likewise, similar notions of information or generalizations, such as Kolmogorov mutual information and partial information decomposition, are not aimed at unfolding the intrinsic structure of a system’s state. Kolmogorov complexity is the algorithmic complexity of a symbol, defined as the length of the shortest program (in bits) that generates that symbol and halts[22]. Furthermore, conditional Kolmogorov complexity can be defined as the length of the shortest program that takes a symbol as input and generates another symbol and halts. Kolmogorov mutual information is the difference between Kolmogorov complexity and conditional Kolmogorov complexity. Similar to Shannon’s measures, such a notion of information can only assign meaning to symbols from the perspective of an external observer and through an extrinsic variable, that is, the algorithmic implementation. Partial information decomposition[23] generalizes the tools of information theory by defining partial information variables. The mutual information that a set of source variables provides about a given target variable is decomposed into unique information, redundancy, and synergy. For the special case of two source variables, the unique information corresponds to the information that one variable provides and the other does not, the redundant information is the information both of the source variables provide, and the synergistic information is the information that the combination of both variables provides, which is not available from each of the variables alone. The notion of partial information variables is consistent with Shannon information (redundancy can coincide with mutual information in special cases) and is an extrinsic quantity defined and measured by an observer outside the system. More importantly, by itself, information decomposition yields a non-negative decomposition into partial information atoms and an ordering among them[23]; it does not yield a structure—that is, a set of objects and the relations among any subset of them.
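The synergy described in footnote 1 is easiest to see with an exclusive-or target. In this sketch (our own example with hypothetical names), X1 and X2 are independent uniform bits and Y = X1 XOR X2. Each source alone carries zero bits about Y, yet together they carry one bit. In a decomposition with non-negative atoms in which each single-source mutual information equals the redundancy plus that source’s unique information, both of those atoms must vanish, so the whole bit is synergistic.

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy in bits; zero-probability entries are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(S;T) in bits from a dict {(s, t): P(s, t)}; s may itself be a tuple."""
    ps, pt = {}, {}
    for (s, t), p in joint.items():
        ps[s] = ps.get(s, 0.0) + p
        pt[t] = pt.get(t, 0.0) + p
    return entropy(ps.values()) + entropy(pt.values()) - entropy(joint.values())

# Two independent uniform source bits and a target Y = X1 XOR X2 (each case has probability 1/4).
samples = [((x1, x2), x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]

def joint_of(select):
    """Joint distribution of (selected sources, Y)."""
    out = {}
    for sources, y in samples:
        key = (select(sources), y)
        out[key] = out.get(key, 0.0) + 0.25
    return out

i1 = mutual_information(joint_of(lambda s: s[0]))   # X1 alone about Y
i2 = mutual_information(joint_of(lambda s: s[1]))   # X2 alone about Y
i12 = mutual_information(joint_of(lambda s: s))     # both sources together about Y
print(i1, i2, i12)  # 0.0 0.0 1.0
```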


Fig. 4. Differences between Shannon information and integrated information. (a) Shannon information captures correlations (e.g., between units A and B owing to their common input from unit X). In contrast, integrated information captures cause-effect power—the ability of units to “take and make a difference,” as assessed through interventional probabilities (e.g., using the “do-operator” between units A and B). (b) Shannon information concerns the extrinsic correlations between a source and a target, while integrated information is intrinsic in that it captures causal power of a system over itself. (c) Shannon information is generic (state independent, gray units), while integrated information is always specific (state dependent [black = ON, white = OFF]). (d) Shannon information is additive for causally segregated units, while integrated information is not (as measured across a minimal partition, orange dotted line). (e) Shannon information is non-decreasing with the number of units or channels, while integrated information is maximal over a definite set of units (often not the largest set). (f) Shannon information is holistic (subsets of the system do not specify information beyond the whole), while integrated information is structured (subsets of the system specify causal distinctions and relations not captured by the whole).
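The additivity in Fig. 4(d) and the non-decreasing behavior in Fig. 4(e) can be checked on two independent noisy channels. In this sketch (our own example; the flip probabilities 0.1 and 0.2 and the helper names are arbitrary assumptions), the mutual information of the combined channel equals the sum over the two segregated pairs, so bundling them only adds Shannon information. No integrated information is computed here; by IIT’s integration postulate, a set of causally independent units has zero integrated information.

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy in bits; zero-probability entries are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(S;T) in bits from a dict {(s, t): P(s, t)}."""
    ps, pt = {}, {}
    for (s, t), p in joint.items():
        ps[s] = ps.get(s, 0.0) + p
        pt[t] = pt.get(t, 0.0) + p
    return entropy(ps.values()) + entropy(pt.values()) - entropy(joint.values())

def bsc_pair(eps):
    """Joint {(x, y): p} for a uniform bit x sent through a channel that flips it with probability eps."""
    return {(x, y): 0.5 * (eps if x != y else 1 - eps) for x, y in product((0, 1), repeat=2)}

pair1, pair2 = bsc_pair(0.1), bsc_pair(0.2)   # two causally segregated source-target pairs
combined = {((x1, x2), (y1, y2)): p1 * p2     # treat both pairs as one larger channel
            for (x1, y1), p1 in pair1.items()
            for (x2, y2), p2 in pair2.items()}

i1, i2, i12 = (mutual_information(j) for j in (pair1, pair2, combined))
print(f"{i1:.3f} + {i2:.3f} = {i1 + i2:.3f}; combined channel: {i12:.3f} bits")
```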


Information processing, coding, and meaning
To appreciate the difference between Shannon information and integrated information with respect to meaning, let us now return to Alice and Bob. Neuroscience and psychology have often borrowed the language of information theory and treated the brain as a biological “information-processing” device that performs sophisticated computations in the service of various functions[7].
Neuroscientists frequently talk of “cracking the neural code”[24]. To exemplify, consider the schematic depiction in Fig. 5 (left panel). Seeing the house on fire triggers a complex set of neuronal interactions in Alice’s brain. An activity pattern over Alice’s retina is transmitted through thalamic nuclei and triggers an activity pattern in primary visual cortex (V1), which could be considered as a symbol or code—encoding, for example, the location of a stimulus in space. From there, divergent feed-forward channels convey information to a set of higher-level cortical areas (say, V3 and V4). Each area “processes” the information through several “computations,” extracting a different variable that may correspond to a relevant category (see Supplementary material I). For instance, activity patterns in one area might encode the variable “house” vs. “no house,” a code that can be passed on to other areas. Information converging from V3 and V4 would then be “integrated” by a downstream area that computes a code for “danger” vs. “no danger.” In line with the information-processing metaphor, one could also envision the brain as performing the equivalent of error correction. For example, top-down codes from high-level areas carrying “contextual” information could counteract noise and disambiguate the message conveyed by the stimulus.2 Moreover, stimulus information could be complemented by endogenous sources of information, say, encoding the thought “worried about the cat” and the intention “should text Bob.” In some views, a competition among local codes would lead to a winning code, which would be broadcast globally throughout the brain and would correspond to information that is consciously accessible[28, 29]. Premotor and motor areas could then decode the winning code and the intention, leading to a motor output.
Ideally, one would be able to explain what the brain does in terms of a set of computations carrying out information processing on assorted neural codes to perform multiple adaptive functions, such as stimulus localization, coordinate transformation, categorization, representation, attentional selection, memorization, and valuation, as well as goal selection, decision making, and so on (Fig. 5, left panel). The information-processing approach to the brain and its functions has been successful in explaining many aspects of how the brain does what it does. However, its shortcomings should not be overlooked. One is that codes, computations, and functions do not map tidily onto brain circuits the way they might onto a computer. As a biological system shaped by evolution, development, and learning, the brain is marked by pleiotropy (multiple effects produced by the same cause) and degeneracy (multiple causes producing the same effect)[30]. It is thus unlikely to comply with neat functional subdivisions. The brain is also unique in its extraordinary connectivity, coupled with spontaneous activity and plasticity. It is not surprising, then, that information about sensory inputs, motor plans, task settings, expectations, and reward can be decoded in a highly distributed manner, from the cerebellum to the frontal pole[31]. Moreover, being able to decode variables or contents that are meaningful to us does not mean that those patterns are meaningful from the intrinsic perspective of the brain[32-34].3 Then there is the issue of which substrate one should “decode.” The brain does not qualify as a channel with well-defined sources and receivers. Should one decode activity patterns from the brain as a whole, the cerebral cortex, the cerebellum, or smaller subregions? Based on what criteria? Should one care whether the readout is from a substrate that is highly integrated, and why, given that information benefits from independence? And at which grain should one decode? Over individual neurons and spikes, distributed populations and mean firing rates, or individual synapses and calcium release?4 Finally, by what criteria would all these different codes be related to one another?

2 Alternatively, the brain might work through “predictive processing,” sending messages top-down to sensory areas[25-27]. These “inferences” about stimuli would then be updated based on error signals provided by stimulus information.

3 After all, we could decode the time of day from the length or orientation of a tree’s shadow, but that does not mean that it encodes time for the tree.

4 Note that an observer could in principle map any program onto almost any thermodynamically open substrate through an appropriate encoding of inputs and outputs[35, 36].

Fig. 5. Information processing and information structures in the brain. Left panel: the brain as an information-processing device. Information in the Shannon sense is extracted by different circuits, processed, transferred, integrated, and broadcast across many areas. It can be decoded both locally and globally at many different grains. The interpretation of the code is provided extrinsically, by an outside observer, based on predefined categories or on assumptions about the computations/functions performed by various neural circuits. Right panel: the brain as a substrate for information structures. IIT identifies the main complex and its grain (say, the region inside the blue border, at the grain of neurons, over one hundred milliseconds). In its current state, the main complex supports a Φ-structure (gray shape) composed of integrated information: subsets of neurons contribute different contents of experience as Φ-folds (highlighted in different colors) related among themselves. The Φ-structure fully defines the intrinsic feeling/meaning of the experience. Groups of neurons outside the main complex interact with it and among themselves (hence Shannon information can be decoded from them) but constitute very small complexes supporting trivial Φ-structures.


Information processing vs. information structures
But the most important shortcoming of information processing approaches has to do with meaning. Even if one were able to sensibly decode activity patterns from different circuits based on plausible computations and functions one might project onto those circuits, where would meaning come from? Ultimately, information processing is a mapping between input and output patterns. As emphasized by Shannon, if an activity pattern is treated as a symbol, it will have no meaning on its own. Its meaning must be attributed extrinsically, relative to observers that provide their own meaning to what the brain (or a computer) might do. Yet when Alice sees the house on fire and worries about her cat, the feeling/meaning of her experience is intrinsic (for her), and absolute, rather than relative to an external observer who interprets what she does. And the meaning is present for her “here and now,” regardless of what the environment might be like and of what her brain might have done or do next. Here is where IIT’s characterization of feeling/meaning as integrated information can explain how a specific activity pattern can have intrinsic meaning. This is because IIT considers information not as a mapping between input and output patterns, but as a structure—a Φ-structure supported by a complex in a state.
As we have seen, IIT’s axioms capture the essential properties of experience, which can be formulated operationally as properties of cause–effect power—IIT’s postulates. The first four postulates require that, because experience is intrinsic, specific, unitary, and definite, the substrate of experience should be a maximum of intrinsic, specific, irreducible cause–effect power. In principle, this allows one to determine, from a system’s transition probability matrix and its current state, the border and grain of the main complex (the largest complex over a substrate). For example, as illustrated schematically in Fig. 5 (right panel), this might correspond to a definite set of units (localized primarily over posterior cortical regions), of a definite grain (say, neurons), which are either active or not over a definite interval (say, thirty milliseconds). Furthermore, because experience is structured by phenomenal distinctions and relations that make it feel the way it feels, the composition postulate requires that the cause–effect power of the main complex be unfolded into causal distinctions and relations that compose its Φ-structure. For example, within the main complex, subsets of neurons contribute different contents of experience as sub-structures of the Φ-structure (Φ-folds), such as “house,” “fire,” and “danger” (highlighted in different colors), which are further related among themselves. According to the theory, the Φ-structure defines in full, with no additional ingredients, the feeling/meaning of the experience.5 The feeling/meaning is defined intrinsically, by the complex and for the complex, and in absolute terms, “here and now,” solely through the distinctions and relations specified by its subsets,6 with no reference to how an outside observer might interpret it, to the performance of some function or computation, to its similarity to other experiences, to what might happen next, or to the environment.7 In short, information is a structure, not a symbol or code, a process, a computation, or a function.8

While an activity pattern on Alice’s retina eventually triggers an activity pattern in her main complex, which specifies a Φ-structure whose feeling/meaning is intrinsic and absolute, activity patterns elsewhere in Alice’s brain would not be associated with any feeling/meaning, or only minimally so. In Fig. 5 (right panel), this is illustrated by a multitude of very small complexes scattered over the rest of the brain, supporting Φ-structures of negligible Φ but interacting among themselves in complicated ways. For example, circuits in Alice’s hypothalamus and brainstem constantly monitor and regulate blood pressure—they are clearly processing blood pressure information. Yet blood pressure regulation goes on completely “in the dark”—it does not have any feeling/meaning for her. Or consider the cerebellum. Like the cortex, the cerebellum is connected to various sensory and motor pathways (and, indirectly, to the cortex itself). It “represents” many aspects of the body and the environment and is involved in many complex processes. In fact, from cerebellar activity patterns, it is possible to decode not only sensory and motor variables, but also cognitive and emotional ones[42]. Yet the cerebellum, despite having about four times as many neurons as the cerebral cortex, can be completely removed without appreciably changing the feeling/meaning of what the person experiences[43].9 So, an activity pattern in certain parts of Alice’s cerebral cortex signifies “house on fire, worried about the cat” intrinsically, but concurrent activity patterns in her hypothalamus, brainstem, and cerebellum signify nothing.10

5 The composition of a Φ-structure, for a specific activity pattern, is determined by the intrinsic connectivity of the main complex. To a large extent, the latter is organized the way it is because it has adapted to a changing environment during a long evolutionary, developmental, and learning history. In this sense, the environment is partially responsible for the meaning of an experience, but only indirectly so—historically, rather than here and now.
6 Along the same lines, the “integration of information” is not the computation of an output by a brain region merging multiple inputs, but a large Φ-fold—the set of the distinctions and relations jointly specified by multiple regions.
Equating meaning with feeling also implies that, without the feeling, there would be no meaning at all. Without subjects who can consciously observe and interpret the world and conceive of functions and computations, there would be no meaning whatsoever. Just imagine a world devoid of conscious beings but populated by artificial intelligence (AI), implemented on digital computers, that can perform any task as well as or better than we do. According to IIT, digital computers have the wrong architecture to be able to support complexes of high Φ and therefore consciousness[19]. In such a world everything would happen “in the dark”: nobody would see anything, hear anything, or think anything.11 A description of a visual scene, such as a “house on fire,” encoded in a high-level language, would be decodable from some portion of the computer memory inside an Alice-like machine. But this high-level code and associated computations would have no intrinsic meaning for the device, because it would have no feeling.12

Communicating integrated information as meaning

IIT’s account of the intrinsic meaning of an activity pattern as a Φ-structure composed of integrated information has direct implications for its communication. In short, meaning can only be communicated if a Φ-fold within a Φ-structure specified by a source complex triggers a similar Φ-fold within a Φ-structure specified by a target complex. The requirements for the communication of integrated information as meaning are thus much more stringent than those for the communication of symbols across a channel. In Fig. 6(a), two computer programs endowed with artificial intelligence send symbols back and forth, performing computations on the symbols that could be interpreted as functionally equivalent to a conversation between two humans. However, per IIT, standard computers have an internal architecture that is unsuitable for supporting Φ-structures of high Φ; hence they would experience nothing[5, 19], and their computations would have no intrinsic meaning.13 In this scenario, the communication of Shannon information is excellent, but there is no intrinsic meaning to communicate.

The opposite scenario is depicted in Fig. 6(b). Here, both the source complex and the target complex support Φ-structures of high Φ that are intrinsically meaningful. However, because the channel is blocked, symbols sent by the source complex cannot trigger the intended Φ-folds at the target complex. Because Shannon information across the channel is zero, no meaning can be communicated. Fig. 6(c) illustrates a more interesting scenario. In this case, the source and target complexes support Φ-structures of high Φ, and the symbols are communicated perfectly across the channel; hence Shannon information is high. Even so, little meaning is communicated because the symbols sent from the source complex fail to trigger the intended Φ-fold in the target complex. This may happen, for example, because the source and target speak different languages. In this case, the symbols transmitted are perceived at a shallow level (say, as letters or phonemes), but they do not percolate deep into the target complex and thus trigger few distinctions and relations. Therefore, perceptual richness is low (Supplementary material II and [11]) and little meaning is communicated. More generally, even if source and target speak the same language, if their internal architecture is substantially different, communication of meaning will be reduced. There may also be outright miscommunication, in the sense that the message may trigger in the target a meaning radically different from the one intended by the source.

7 This view of meaning also differs from extrinsic views that assign meaning based on the similarities among activity patterns or their location within “semantic spaces”[37-41].
8 If one were to insist on the coding metaphor, one could say that IIT provides the ultimate recipe for decoding intrinsic meaning: it specifies over which substrate to decode (the main complex), at what grain (the one maximizing the φ value of the complex), and how to unfold (decode) its Φ-structure. It also provides a way to characterize the contribution to intrinsic meaning of an individual unit (say, an activated neuron in a “face area”) through the Φ-fold it specifies (the set of distinctions it contributes, alone and in combination with other units, and the associated relations).
9 This is consistent with IIT’s notion that the cerebellum, owing to its strictly modular, feed-forward architecture, cannot support a complex of high Φ, unlike the cerebral cortex, much of which is organized like a dense lattice[12].
10 In fact, even activity patterns in certain parts of cortex may signify nothing if the underlying organization of connections is highly modular, as seems to be the case for much of prefrontal cortex. Similarly, patterns of activity in posterior cortex may signify nothing during dreamless deep sleep, even though neurons are typically still active. This is because breakdown of causal interactions due to neuronal bistability leads to the disintegration of the cortical complex[12]. Finally, some neuronal activity in posterior cortex may signify nothing even during wakefulness, if it specifies cause–effect states that are incongruent (different) with respect to the cause–effect state specified by the main complex as a whole. This may occur, for example, during binocular rivalry, when many neurons “encode” information about an unseen stimulus that is incongruent with the seen stimulus.
11 Of course, if we presuppose that consciousness is nothing but some computation or function, then computers mimicking us would be conscious by definition. If IIT is right, however, computers can be functionally equivalent to us but cannot be conscious because, as in the case of the cerebellum, their substrate cannot support Φ-structures of high Φ[19]. Crucially, IIT’s validity can be assessed empirically by determining whether the theory can account for the presence and quality of our own consciousness and how it relates to its neural substrate.
12 The Chinese room argument[44], which was originally aimed at showing that syntax (processing) is not enough for semantics (meaning), shares a similar insight. In the argument, one imagines being conscious and performing a set of functions correctly—answering questions in Chinese—but without any awareness of their meaning. In a world with no conscious beings, all kinds of functions could be carried out without anybody experiencing anything, and therefore without any intrinsic meaning.
13 Strictly speaking, small subsets of transistors in each computer may have trivially low but non-zero Φ[19].

Fig. 6. Communicating meaning according to IIT. There is no communication of meaning if (a) the source or the target is not conscious, (b) the symbols are not transmitted successfully, or (c) the symbols do not trigger a similar sub-structure within the Φ-structure of the target. (d) Successful communication of meaning happens when all the above conditions are met, as reflected by the similarity of source and target Φ-folds.

Finally, Fig. 6(d) illustrates the successful communication of meaning. The source complex in its current state supports a Φ-structure corresponding to the current experience, within which a Φ-fold corresponds to a particular content. The complex’s current state triggers an output over a motor interface, which sends a symbol over a communication channel. Through a sensory interface, the symbol triggers a state in the target complex that supports a large Φ-structure, and a Φ-fold within it bears structural similarity to the source’s Φ-fold. Appropriately, IIT’s notion of information as the communication of meaning fits the original meaning of “informare” as “give form” to the mind. Just as the Φ value of the Φ-fold can be used to measure the amount of meaning, the similarity of the two Φ-folds could be used to measure the amount of meaning communicated.14 However, unlike Shannon information, which can be communicated perfectly in most instances, the communication of meaning will be approximate. This is because integrated information depends on the internal organization of source and target complexes, which can hardly be identical. Human brains share an evolutionary history, but developmental events and learning trajectories will necessarily result in individual differences in the precise wiring of the neural substrate of consciousness. Even if the activity pattern were the same, then, the intrinsic meaning of the Φ-structure specified by each substrate would differ from person to person.15

14 This is not easy, as it requires unfolding structures with a very large number of components and developing ways to compare them optimally.
15 A scenario not illustrated in the figure is a perfect one-to-one mapping in which every activity pattern (hence Φ-structure) in the source complex would be able to send a different symbol to the target complex, where it would trigger a different Φ-structure. However, the mapping would be between radically different Φ-structures—that is, between radically different meanings (say, whenever the source thinks and says “gnat,” the target hears and thinks “tang”). One could nevertheless find a transformation (“translation”) that maximizes the communication of meaning. This case also emphasizes that, while the mapping may reveal symmetries among the meanings specified by the source and target complexes such that one “semantic space” could be rotated (“translated”) into the other, in IIT’s analysis, the meaning of an experience is defined in absolute terms by the distinctions and relations that compose a specific Φ-structure, rather than by its similarity to other Φ-structures. By the same token, similarities among Φ-structures can be exploited to assess similarities among meanings in absolute terms.

Conclusion

As outlined in this paper, both information theory[4] and integrated information theory (IIT)[5] take an axiomatic approach to characterize information and employ the tools of probability theory. However, they differ in critical ways: in essence, information theory deals with messages, IIT with meaning. At the most general level, information theory is about the reliable transmission of symbols. It takes the extrinsic perspective of a communication engineer; characterizes information through axioms based on that perspective; picks a channel with its source, target, and capacity; and devises codes to optimally transmit messages across it.

IIT addresses the feeling/meaning of an experience. It takes the intrinsic perspective of a conscious subject and defines integrated information following axioms that characterize the essential properties of consciousness. In physical terms, this implies that the substrate of consciousness must be causal, intrinsic, specific, unitary, definite, and structured. Based on these properties, the IIT formalism can be used to unfold the intrinsic meaning of an activity pattern over a complex of units, yielding a Φ-structure, and to quantify it as integrated information (Φ). In principle, the communication of meaning can be assessed by measuring similarities between source and target Φ-folds. Shannon information satisfies different properties, suited to quantifying the communication of symbols across a channel, but its formalism has nothing to say about the symbols’ intrinsic meaning.

Unlike information theory, IIT was systematized only recently. While the core of the theory has remained unchanged, its formalism is still undergoing refinements and expansions[5, 45-47]. Furthermore, the exhaustive unfolding of Φ-structures is infeasible for large systems due to the combinatorial explosion of candidate systems, unit grains, distinctions, relations, and partitions. Nevertheless, the IIT framework has explanatory, predictive, and inferential power[12]. For example, it explains why certain regions of the brain can support consciousness while others cannot[12] and why consciousness is lost during dreamless sleep, anesthesia, and generalized seizures[17, 48]. Measures of complexity inspired by IIT can classify subjects as conscious vs. unconscious with unmatched sensitivity[18, 49]. IIT is also being used to account for the quality of experience—why space feels extended[15], time flowing[14], and objects as binding general concepts with particular features[13]—based on the organization of neural connections in different brain regions. Finally, IIT’s account of intrinsic meaning as a structure composed of integrated information has several implications. For example, while standard computers can be functionally equivalent to humans, their wiring is incompatible with consciousness[19]. Therefore, even though computers may engage in seemingly meaningful behaviors and utter seemingly meaningful sentences when driven by the proper software, their actions and utterances can only be imbued with intrinsic meaning when interpreted by a conscious being. Even among conscious entities, intrinsic meaning depends on the particular composition of Φ-structures, which in turn depends on the precise wiring of their substrate, which is bound to vary as a result of evolutionary history, developmental events, and learning.
Therefore, even shared meanings will differ to some extent, and many meanings will be idiosyncratic. This poses a limit on the communication of meanings but also suggests how one might proceed to assess it and improve it.

Page 17

Supplementary material I (information theory)

Source entropy and source coding

If we know the probabilities of occurrence for all the symbols in the source alphabet, we can assign to each symbol a unique binary codeword. A message is then a sequence of binary codewords. Shannon showed that the expected length of the binary sequence generated by the source (the message) is minimized by assigning $\log_2(1/p_n)$ binary digits to the nth symbol, where $p_n$ is its probability of occurrence[6]. This means that, given optimal source coding, the information content of a message is simply its length in bits. The minimum expected length per symbol (the rate at which information is produced) is called source entropy. As a measure of information content, Shannon entropy $H = \sum_n p_n \log_2(1/p_n)$ has the following desired properties (the first three can be used to derive it uniquely[4]):

Continuity: H is continuous in $p_n$.
Additivity: If symbol generation is broken down into successive selections of symbols, the entropy of the overall process is the entropy of the first selection plus the weighted sum of the entropies of the subsequent selections. For example, if the source alphabet contains sounds in the English language, we can generate symbols by first choosing whether the symbol is a vowel or not, and then sampling from the vowel or consonant subset. The entropy of the overall sound-generation process is the entropy of the vowel/consonant selection plus the entropies of sampling from the vowel and consonant subsets, weighted by their corresponding probabilities of selection.
Monotonic increase with number of outcomes: H increases monotonically with the number of symbols N if the symbols are equiprobable, $p_n = 1/N$.
Maximum: If the number of outcomes is fixed, H is maximal when all the outcomes are equiprobable.
Symmetry: H does not change if the outcomes are reordered. For example, if we generate the sequence using a new probability distribution that is a permutation of the original probabilities, the entropy value does not change.

These are natural properties if we define the information content of a message as the number of binary digits that minimizes its length. Importantly, it can be shown that Shannon entropy H is the average number of bits per symbol required to represent a typical message from a source.16

16 Thanks to the asymptotic equipartition property[6, 8, 50], if we have a distribution over N symbols and draw M samples from it, the probability of any typical outcome is $2^{-MH}$, where H is the Shannon entropy of the distribution and M is large enough. This is because, with probability close to 1, the nth symbol occurs approximately $p_n M$ times in any such outcome, and the probability of occurrence of any such outcome is then $\prod_{n=1}^{N} p_n^{p_n M} = 2^{-MH}$. If we want to represent these long equiprobable sequences with bits, we need to use at least MH bits; otherwise we will run out of codewords to assign to all the $2^{MH}$ members. Thus, MH is the minimum number of bits that we need to represent the sequences, and H is the average number of bits per symbol.

Mutual information, channel capacity, and channel coding

To assess how well information is transmitted across a channel, information theory resorts to mutual information, which characterizes the statistical dependence between the source (X) and the target (Y) in terms of their probability distributions of outcomes (not individual outcomes). Mutual information I(X;Y), defined as H(Y) – H(Y|X), can be rewritten as $D_{KL}(P(X,Y) \,\|\, P(X)P(Y))$, that is, the Kullback–Leibler divergence between the joint distribution and the product of the marginals. The measure is zero if and only if X and Y are independent. It is usually described as the information that random variable Y conveys about random variable X on average, and vice versa. Shannon defined the capacity of a channel as the maximum rate at which information can be transmitted through it, corresponding to the maximum of mutual information between source and target (the maximization is with respect to the source distribution). It can be shown that, for a long enough sequence of symbols, we can assign binary codewords to X such that, on average, I(X;Y) bits can be recovered uniquely by observing Y[4].17 Shannon further showed that, at rates below channel capacity, it is possible to design an error control code (channel coding) whose probability of error is arbitrarily small.

Information transmission and codes

In information theory, the most general definition of a code is a mapping from a set of symbols (such as outcomes of X) to another set (such as binary digits). Shannon’s source coding allows one to recover (decode) the original symbols from their binary representations uniquely, while minimizing the expected length of the overall message (lossless compression). But in general, a code does not need to be binary, deterministic, or unique, or to minimize the expected length of messages. Thus, we can think of Y as a code for X, and of P(Y|X) as the coding function, as long as P(Y|X) ≠ P(Y), i.e., I(X;Y) > 0 (that is, as long as we can answer some questions about X by observing Y). To illustrate, consider the simple channel in Fig. 7(a). Both the input (X) and output (Y) alphabets contain four symbols, corresponding to 2 bits per symbol, and each input symbol can lead to two output symbols with equal probability.
However, the channel is noisy: regardless of the distribution over X, H(Y|X) = 1 bit, so the channel capacity is only 1 bit per symbol and the mutual information I(X;Y) is at most 1 bit. Even so, Y can be considered as a code for X because observing Y allows us to answer at least some questions about X. Moreover, which questions we can answer depends on extrinsic factors, namely the coding scheme determined by the channel designer. For example, the designer could choose to transmit only r and g, with equal probabilities, corresponding to a source entropy of 1 bit. The source could be encoded by assigning a 1-bit codeword to r and g (codebook {r : 0, g : 1}), achieving error-free communication at channel capacity. Or the designer could choose to achieve channel capacity by transmitting all the symbols with equal probability 0.25, corresponding to a source entropy of 2 bits. Then, using codebook {r : 00, y : 01, g : 11, b : 10} (Gray code), the receiver acquires full knowledge of 1 out of the 2 transmitted bits. Thus, when the receiver observes the yellow-green flag, they will know that the second bit in the transmitted message is 1, while the first bit remains fully unknown. In short, mutual information tells us how many bits can be transmitted, but it does not tell us which bits, nor can it tell us what they mean—that is left entirely to the interpretation (codebook) of extrinsic observers.
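The numbers in the Fig. 7(a) example can be checked with a short Python sketch. It is our illustration, not the authors' code, and because the figure itself is not reproduced here it rests on an assumption: the four outputs are the two-color flags red-yellow, yellow-green, green-blue, and blue-red, and each input color leads with probability 1/2 to one of the two flags that contain it (a "noisy typewriter" layout consistent with the yellow-green example above). The helper names (`channel`, `mutual_information`, `known_bits`) are hypothetical. The sketch computes I(X;Y) as the Kullback–Leibler divergence between the joint distribution and the product of the marginals for both source choices discussed above, then lists which codeword bit each flag fixes under the Gray code.

```python
import math
from collections import defaultdict

# ASSUMED reconstruction of Fig. 7(a): each input color leads, with
# probability 1/2, to one of the two two-color flags that contain it.
COLORS = ["r", "y", "g", "b"]
FLAGS = ["ry", "yg", "gb", "br"]
GRAY = {"r": "00", "y": "01", "g": "11", "b": "10"}


def channel(x):
    """P(Y | X = x): uniform over the two flags containing color x."""
    return {f: 0.5 for f in FLAGS if x in f}


def entropy(dist):
    """Shannon entropy in bits, H = sum_n p_n log2(1/p_n)."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)


def mutual_information(p_x):
    """I(X;Y) = D_KL(P(X,Y) || P(X)P(Y)) in bits, for source distribution p_x."""
    joint = {(x, y): px * pyx
             for x, px in p_x.items()
             for y, pyx in channel(x).items()}
    p_y = defaultdict(float)
    for (_, y), p in joint.items():
        p_y[y] += p
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0)


def known_bits(flag, codebook):
    """Codeword positions whose value is certain once `flag` is observed."""
    words = [codebook[x] for x in COLORS if x in flag]
    return {i: words[0][i] for i in range(len(words[0]))
            if all(w[i] == words[0][i] for w in words)}


uniform = {x: 0.25 for x in COLORS}   # source entropy 2 bits
two_symbols = {"r": 0.5, "g": 0.5}    # source entropy 1 bit
for name, p_x in [("uniform", uniform), ("r and g only", two_symbols)]:
    print(name, entropy(p_x), mutual_information(p_x))
# uniform 2.0 1.0
# r and g only 1.0 1.0

for flag in FLAGS:
    print(flag, known_bits(flag, GRAY))
# ry {0: '0'}
# yg {1: '1'}
# gb {0: '1'}
# br {1: '0'}
```

Both source distributions reach the 1-bit capacity. With r and g only, H(X) = I(X;Y) = 1 bit, so H(X|Y) = 0 and decoding is error-free; with all four colors, each flag fixes exactly one of the two Gray-code bits, and which bit it fixes is set by the designer's codebook rather than by the channel.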

17 This is again based on the asymptotic equipartition property[6].
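Footnotes 16 and 17 both appeal to the asymptotic equipartition property. The following Python sketch illustrates it for a four-symbol source whose probabilities (1/2, 1/4, 1/8, 1/8) are our own choice, picked so that the ideal codeword lengths $\log_2(1/p_n)$ are whole numbers. It checks that those lengths average to H and that the per-symbol value $-\frac{1}{M}\log_2 P(\text{sequence})$ of a random sequence of M symbols approaches H as M grows, which is what makes a typical sequence have probability close to $2^{-MH}$. The function names are hypothetical.

```python
import math
import random

# Illustrative dyadic source (our assumption): ideal code lengths are integers.
PROBS = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}


def entropy(probs):
    """Shannon entropy in bits."""
    return sum(p * math.log2(1 / p) for p in probs.values() if p > 0)


def ideal_lengths(probs):
    """Shannon's ideal codeword length log2(1/p_n) for each symbol."""
    return {s: math.log2(1 / p) for s, p in probs.items()}


def empirical_rate(probs, m, seed=0):
    """-(1/m) log2 P(sequence) for m i.i.d. symbols drawn from probs."""
    rng = random.Random(seed)
    symbols = list(probs)
    sample = rng.choices(symbols, weights=[probs[s] for s in symbols], k=m)
    return -sum(math.log2(probs[s]) for s in sample) / m


H = entropy(PROBS)
lengths = ideal_lengths(PROBS)
expected_length = sum(PROBS[s] * lengths[s] for s in PROBS)
print(H, lengths, expected_length)
# 1.75 {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 3.0} 1.75

# The per-symbol rate of a random sequence drifts toward H as m grows.
for m in (10, 1_000, 100_000):
    print(m, round(empirical_rate(PROBS, m), 3))
```

Here H = 1.75 bits, and the prefix code {a: 0, b: 10, c: 110, d: 111} attains these lengths exactly, so its expected length equals the source entropy. For non-dyadic sources the lengths must be rounded up to integers, and the entropy is then approached by coding long blocks of symbols.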

Information processing and relevant codes

Extrinsic observers can choose not only particular codebooks but also extrinsic variables that guide the encoding and transmission of Shannon information. In other words, they can design information-processing channels to answer questions relevant to them, while ignoring other data[51]. For example, consider the information channels with two successive stages shown in Fig. 7(b) and Fig. 7(c). Suppose that we are not interested in recovering X from Y, but only in a certain aspect of X, whose presence or absence we can encode with the random variable W. We can then define relevance as how well we can recover W given X vs. given Y. This is known as the information-bottleneck method and helps us to measure whether Y is efficiently squeezing information about W from X[51]. This provides us with a framework to quantify not only how many questions we can answer about X by observing Y, but which questions, as long as we can resort to an extrinsic variable W.18

In Fig. 7(b), X is not the channel source but an intermediate encoding of W. The first stage of the channel, W → X, does not introduce any noise. Its capacity is 1 bit, so we can perfectly recover symbols of W, and the optimal input distribution is a uniform distribution over W. The second stage, X → Y, also has a channel capacity of 1 bit. However, it does not preserve the relevant bit about W: H(W) = 1, I(W;X) = 1, but I(W;Y) = 0.5, meaning that we cannot recover W exactly from Y in one transmission. Yet a trivial rearrangement of the channel can make Y a more efficient code for W without changing the channel capacity of X → Y, as shown in Fig. 7(c). In this new scenario, H(W) = 1, I(W;X) = 1, and I(W;Y) = 1, leading to perfect recovery of W (but not X) from Y. Thus, the question of whether Y is a good code for W boils down to which statistical dependencies are being preserved by Y. Information theory provides us with useful tools to examine such statistical dependencies. Even so, information processing can never increase the information transmitted about the inputs to a channel, and thus the number of questions we can answer about them[6, 8]. By the data-processing inequality, if we represent a channel as the Markov chain W → X → Y, then H(W) ≥ I(W;X) ≥ I(W;Y). Most relevantly, information processing does not deal with how the meaning of the symbols would be assigned and interpreted.

18 An interesting case arises when we define W as the future state of X. In this case, what is relevant is how well the channel is able to predict the future (predictive processing)[25-27].

Fig. 7. Codes and relevant codes. Examples showing how an external observer assigns meaning to codes through extrinsic partitioning of sample spaces. The arrows depict which symbols in the source lead to which symbols in the target with equal probabilities. (a) Due to the channel noise, the observer can at most recover 1 bit assigned to X by observing Y and can achieve this by designing a codebook for X and a decoding algorithm (see Supplementary material I). (b) The observer can introduce a new random variable W (say, representing the presence/absence of faces) that partitions the X space (space of visual stimuli) and define the relevant bit using W. While Y preserves 1 bit from X and X preserves 1 bit from W, Y only preserves 0.5 bit of the relevant information W. Y is still a code for W, a partitioning defined by the external observer. (c) The third channel preserves the relevant correlations, without changing the capacity, and makes Y an efficient code for W. Now, the observer is able to design a codebook for W and a decoding algorithm to recover the symbols assigned to W by observing Y.
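The values quoted for Fig. 7(b) and Fig. 7(c) can be reproduced in a short Python sketch. The text does not give the figure's arrows, so the two X → Y stages below are hypothetical channels we built to match the stated numbers: W is a uniform bit; X has four symbols with a noiseless first stage (W = 0 maps to X ∈ {0, 1} and W = 1 to X ∈ {2, 3}); in the (b)-like stage each x goes to x or x + 1 (mod 4), and in the (c)-like stage each x goes to either output of its own W-block. In both stages H(Y|X) = 1 bit and H(Y) ≤ 2 bits, so the capacity of X → Y is 1 bit. All names are our own.

```python
import math
from collections import defaultdict

P_W = {0: 0.5, 1: 0.5}  # relevant bit W, uniform (the optimal input distribution)


def w_to_x(w):
    """Noiseless first stage: W = 0 -> X in {0, 1}, W = 1 -> X in {2, 3}."""
    return {2 * w: 0.5, 2 * w + 1: 0.5}


def x_to_y_b(x):
    """Hypothetical Fig. 7(b)-like stage: x -> x or x + 1 (mod 4)."""
    return {x: 0.5, (x + 1) % 4: 0.5}


def x_to_y_c(x):
    """Hypothetical Fig. 7(c)-like stage: x -> either output in its own W-block."""
    block = 2 * (x // 2)
    return {block: 0.5, block + 1: 0.5}


def mutual_information(joint):
    """I(A;B) in bits from a joint distribution given as {(a, b): p}."""
    p_a, p_b = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        p_a[a] += p
        p_b[b] += p
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in joint.items() if p > 0)


def chain_joints(x_to_y):
    """Joint distributions P(W,X), P(X,Y), P(W,Y) for the Markov chain W -> X -> Y."""
    wx, xy, wy = defaultdict(float), defaultdict(float), defaultdict(float)
    for w, pw in P_W.items():
        for x, px_w in w_to_x(w).items():
            wx[(w, x)] += pw * px_w
            for y, py_x in x_to_y(x).items():
                p = pw * px_w * py_x
                xy[(x, y)] += p
                wy[(w, y)] += p
    return wx, xy, wy


for label, stage in [("7(b)-like", x_to_y_b), ("7(c)-like", x_to_y_c)]:
    wx, xy, wy = chain_joints(stage)
    # Columns: I(W;X), I(X;Y), I(W;Y)
    print(label, mutual_information(wx), mutual_information(xy), mutual_information(wy))
# 7(b)-like 1.0 1.0 0.5
# 7(c)-like 1.0 1.0 1.0
```

Both stages transmit 1 bit about X, but only the rearranged one keeps that bit aligned with the observer's partition W; in both cases I(W;Y) ≤ I(W;X) ≤ H(W), as the data-processing inequality requires.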

Supplementary material II (IIT)

Experience, its essential properties, and their operational formulation

IIT starts from experience itself, rather than from its behavioral, functional, or neural correlates[5, 10, 12]. The existence of experience, which is immediate and irrefutable, is the 0th axiom of IIT.19 From within consciousness, we can plausibly assume the existence of a world independent of our own experience. We can call this world physical if we can reliably observe it and manipulate it. IIT defines physical existence operationally as cause–effect power—the ability to “take and make a difference”—which is the 0th postulate of IIT. Searching for the substrate of consciousness in physical terms then means formulating phenomenal existence (the 0th axiom) operationally, in terms of cause–effect power (the 0th postulate).
0) Existence: Experience exists. Therefore, the substrate of consciousness must be constituted of units that can take and make a difference.

IIT then identifies the essential properties of consciousness—immediate and true of every conceivable experience—as the five axioms of phenomenal existence. These properties are formulated in terms of cause–effect power as the five postulates of physical existence[5], which are employed to identify substrates of consciousness and unfold their structures. The five axioms and corresponding postulates are as follows:

  1. Intrinsicality. Experience is intrinsic: it exists for itself. Thus, the cause–effect power of its substrate must be intrinsic: it must take and make a difference within itself.
  2. Information. Experience is specific: it is this one. Thus, the cause–effect power of its substrate must be specific: it must be in this state and select this cause–effect state.
  3. Integration. Experience is unitary: it is a whole, irreducible to separate experiences. Thus, the cause–effect power of its substrate must be unitary: it must specify its cause–effect state as a whole set of units, irreducible to separate subsets.
  4. Exclusion. Experience is definite: it is this whole—all of it. Thus, the cause–effect power of its substrate must be definite: it must specify its cause–effect state as this whole set of units.
  5. Composition. Experience is structured: it is composed of distinctions and the relations that bind them together, yielding a phenomenal structure that feels the way it feels. Thus, the cause–effect power of its substrate must be structured: subsets of units must specify cause–effects over subsets of units (distinctions) that can overlap with one another (relations), yielding a cause–effect structure (or Φ-structure) that is the way it is.

On this basis, IIT claims that every property of an experience must be accounted for by the Φ-structure specified by its substrate, with no additional ingredients. This is the fundamental explanatory identity of IIT[5].20

19 In fact, experience is the starting point for everything, including science, logic, and mathematics[52].
20 In addition, IIT employs ontological principles, key among them being the principles of maximal and minimal existence[5]. Maximal existence states that, when it comes to a requirement for existence, what exists is what exists the most. Minimal existence states that something cannot exist more than the least it exists.

Complexes

Based on the postulates and the transition probability matrix (TPM) of a system of units, IIT proceeds to identify sets of units that qualify as substrates of consciousness (complexes)[5, 53]. A key role is played by the intrinsic difference[54] between probability distributions, a measure that uniquely satisfies the 0th and first two postulates—being causal, intrinsic, and specific (much as Shannon’s entropy function is uniquely derived from its defining properties). A complex is a set of units that has cause–effect power (existence) within itself (intrinsicality), is in a specific state and selects a specific cause and effect state over itself (information), and does so in a way that is not only irreducible (integration) but maximally so (exclusion). Irreducibility is measured by integrated information $\varphi_s$ (“system phi” or “small phi”), which for a complex must be higher than that of any set of units overlapping with it (by the principle of maximal existence). Any substrate of units condenses into a disjoint (non-overlapping) and exhaustive set of complexes, some of which may be exceedingly large and others negligibly small (Fig. 3). The “intrinsic units” of each complex also have a definite grain (typically, they will be macro-units constituted of many micro-units[55]), which is the one maximizing the existence of the complex. From the intrinsic perspective of a complex, the rest of the universe can be considered as a set of background conditions that can mediate extrinsic interactions.

Φ-structures and differentiation capacity

Once a complex has been identified, IIT proceeds to unfold its Φ-structure (Fig. 3). This is composed of all the causal distinctions and relations specified by subsets of units within the complex. A subset of units specifies a distinction if it selects a specific, maximally irreducible cause and effect within the complex that is congruent (in the same state) with the cause and effect state of the complex as a whole.
Relations obtain whenever there are overlaps among the causes and/or effects of distinctions (details in [5]). Each distinction and relation has an associated value of integrated information (φ), whose sum is the structure integrated information (Φ, “structure Phi” or “big Phi”). The operation of determining all the distinctions and relations that compose the Φ-structure specified by a complex in a state is called unfolding. The number of units in a complex and the organization of their intrinsic interactions impose an upper bound on the number and strength of the causal distinctions and relations the complex can specify in a given state[56]. This translates to an upper bound on the sum of the φ values of all unique distinctions and relations a complex can specify across all its possible states, called differentiation capacity[11]. This bound is somewhat analogous to a channel’s capacity as an upper bound on information transmission in the case of Shannon information.
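To make the contrast with Shannon's measures concrete, the Python sketch below compares the Kullback–Leibler divergence, which averages over all states, with the intrinsic difference. We assume here, as our reading of [54] rather than a definition given in this text, that the intrinsic difference between a constrained distribution p and an unconstrained distribution q is $\mathrm{ID}(p\|q) = \max_s p(s)\log_2\frac{p(s)}{q(s)}$, so that it is evaluated at a single selected state. The two-state distributions and the function names are hypothetical, and this is not an implementation of IIT's unfolding, which also involves partitions, maximization over candidate mechanisms and purviews, and relations; the definitions should be checked against [54] and [5].

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) in bits: an average over all states."""
    return sum(p[s] * math.log2(p[s] / q[s]) for s in p if p[s] > 0)


def intrinsic_difference(p, q):
    """ASSUMED form of the intrinsic difference: the maximum over states s of
    p(s) * log2(p(s) / q(s)). Returns (value, selected_state)."""
    scores = {s: p[s] * math.log2(p[s] / q[s]) for s in p if p[s] > 0}
    state = max(scores, key=scores.get)
    return scores[state], state


# Hypothetical example: a unit constrained to be "on" with probability 0.9,
# compared with an unconstrained 50/50 distribution.
p = {"on": 0.9, "off": 0.1}
q = {"on": 0.5, "off": 0.5}
print(round(kl_divergence(p, q), 3))   # 0.531
value, state = intrinsic_difference(p, q)
print(round(value, 3), state)          # 0.763 on
```

In the KL divergence the negative contribution of the "off" state partly cancels that of the "on" state, whereas the assumed intrinsic difference singles out one specific state, which mirrors the requirement that a substrate select a specific cause–effect state.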
Contents of experience (Φ-folds)

For IIT, the Φ-structure accounts for both the quantity and the quality of experience, with all its contents. The quantity is given by the Φ-structure’s Φ value. The quality, or the way an experience feels (its “feeling”), is fully characterized by how the distinctions that compose the Φ-structure are related among themselves. The feeling of an experience is also its intrinsic meaning (the feeling is the meaning). Consider, for example, the feeling of spatial extendedness that accompanies most visual (and bodily) experiences. This feeling is what “space” means from the intrinsic perspective of a subject. Phenomenal space can be characterized as a structure composed of “spots” that are related by reflexivity, inclusion, fusion, and connection[15]. Phenomenal structures of this kind can be accounted for by Φ-structures specified by lattices of units, such as those found in posterior cortical areas of the brain, in line with neurological evidence[15]. More generally, any specific content within an experience, say, seeing the house on fire, corresponds to a part of the Φ-structure. This part is a Φ-fold, or sub-structure, composed of a highly interrelated subset of the structure’s distinctions and relations. The Φ-fold corresponds to the feeling/meaning of seeing the house on fire from the intrinsic perspective of the subject of the experience (Fig. 5, right panel). For IIT, then, information is not a symbol or code but a structure, which corresponds to its intrinsic meaning. Information content is quantified not by the length of a message in bits but by the sum of the φ values of the distinctions and relations composing the Φ-fold.

Meaning and matching

Because the feeling/meaning of an experience is defined intrinsically, it does not matter how the experience is triggered. For example, the feeling/meaning of the house on fire is similar whether it is triggered by an external stimulus (Alice), imagined (Bob), or dreamt. However, one can assess the extent to which contents of experience are caused by external stimuli by multiplying the corresponding Φ-folds by a triggering coefficient, yielding a measure of perceptual richness[11]. Accordingly, perceptions of external inputs can be considered as interpretations whose intrinsic meaning is provided by the Φ-structure. In an adapted brain, intrinsic meanings will largely refer to (represent) relevant causal processes in the environment. During wakefulness, for example, exogenous causal processes produce correlated inputs. These inputs trigger activity patterns that “up-select” (net strengthen) intrinsic connections, transforming extrinsic correlations into intrinsic causal powers. During sleep, intrinsic connections are “down-selected” (net weakened) based on activity patterns triggered endogenously.
Over multiple sleep-wake cycles, the intrinsic connectivity is adjusted to preserve intrinsic meanings that “match” causal features of the environment at the expense of those that do not[11, 57, 58]. The overall matching between intrinsic meanings and causal processes in the environment can be estimated by measuring the differentiation of neural activity[11].
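The bookkeeping that links distinctions, relations, Φ, and Φ-folds in this section can be sketched as a small data structure. In this hypothetical Python sketch, each distinction carries its mechanism, its cause and effect purviews (sets of units), and a φ value supplied by hand. Relations are detected only between pairs of distinctions whose causes or effects share units. The sketch ignores higher-order relations and the state congruence that IIT requires. `placeholder_relation_phi` takes the smaller φ of the two relata. It is a stand-in chosen for illustration, not IIT’s formula for the φ of a relation. Φ is computed as the sum over all distinctions and relations. A Φ-fold is scored the same way over a chosen subset of distinctions and the relations among them.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Distinction:
    mechanism: frozenset  # subset of units specifying the distinction
    cause: frozenset      # units in its cause purview
    effect: frozenset     # units in its effect purview
    phi: float            # integrated information (hypothetical value here)

def placeholder_relation_phi(d1, d2):
    # Hypothetical stand-in, NOT IIT's formula: the smaller phi of the relata.
    return min(d1.phi, d2.phi)

def find_relations(distinctions, relation_phi):
    """Pairwise relations only: two distinctions are related when a cause or
    effect purview of one shares units with a cause or effect purview of the
    other. State congruence and higher-order relations are ignored."""
    rels = []
    for d1, d2 in combinations(distinctions, 2):
        if any(a & b for a in (d1.cause, d1.effect)
               for b in (d2.cause, d2.effect)):
            rels.append((d1, d2, relation_phi(d1, d2)))
    return rels

def total_phi(distinctions, relations):
    """Sum of phi over distinctions and relations: Phi for a whole structure,
    or the content of a Phi-fold when applied to a sub-structure."""
    return sum(d.phi for d in distinctions) + sum(phi for _, _, phi in relations)

def phi_fold(distinctions, relations, mechanisms):
    """Sub-structure: the chosen distinctions and only the relations among them."""
    ds = [d for d in distinctions if d.mechanism in mechanisms]
    rs = [r for r in relations
          if r[0].mechanism in mechanisms and r[1].mechanism in mechanisms]
    return ds, rs

U = frozenset  # shorthand for a set of unit labels
structure = [  # hypothetical distinctions over units A, B, C
    Distinction(U("A"), U("AB"), U("A"), 0.5),
    Distinction(U("B"), U("B"), U("BC"), 0.4),
    Distinction(U("C"), U("C"), U("C"), 0.3),
    Distinction(U("AB"), U("AB"), U("ABC"), 0.2),
]
rels = find_relations(structure, placeholder_relation_phi)
fold_ds, fold_rs = phi_fold(structure, rels, {U("A"), U("B")})

print(len(rels), round(total_phi(structure, rels), 6))      # 5 2.7
print(len(fold_rs), round(total_phi(fold_ds, fold_rs), 6))  # 1 1.3
```

The Φ-fold built from the distinctions over A and over B keeps only the one relation between them. Its content (1.3 in these made-up units) is therefore a part of the whole structure’s Φ (2.7).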

References
1. Floridi, L., The Philosophy of Information. 2011. 2. Horgan, T., Original Intentionality is Phenomenal Intentionality. The Monist, 2013. 96(2): p. 232-251. 3. Putnam, H., The meaning of ‘meaning’. Minnesota Studies in the Philosophy of Science, 1975. 7: p. 131-193. 4. Shannon, C.E., A Mathematical Theory of Communication. Bell System Technical Journal, 1948. 27(3): p. 379-423. 5. Albantakis, L., et al., Integrated information theory (IIT) 4.0: Formulating the properties of phenomenal existence in physical terms. PLoS Comput Biol, 2023. 19(10): p. e1011465. 6. MacKay, D.J.C., Information Theory, Inference, and Learning Algorithms, D.J.C. MacKay, Editor. 2003, Cambridge University Press. p. 334-340. 7. Rieke, F., et al., Spikes: exploring the neural code. 1999. 8. Cover, T.M., Elements of information theory. 1999: John Wiley & Sons. 9. Nagel, T., What is it like to be a bat? The Philosophical Review, 1974. 83(4): p. 435-450. 10. Ellia, F., et al., Consciousness and the fallacy of misplaced objectivity. Neuroscience of Consciousness, 2021. 2021(2). 11. Mayner, W.G.P., B.E. Juel, and G. Tononi, Intrinsic meaning, perception, and matching. arXiv, 2024. 12. Tononi, G., et al., Integrated information theory: from consciousness to its physical substrate. Nat Rev Neurosci, 2016. 17(7): p. 450-61. 13. Grasso, M. and G. Tononi, How do phenomenal objects bind general concepts with particular features. forthcoming. 14. Comolatti, R., M. Grasso, and G. Tononi, Why does time feel the way it does? arXiv, 2024. 15. Haun, A. and G. Tononi, Why Does Space Feel the Way it Does? Towards a Principled Account of Spatial Experience. Entropy, 2019. 21(12): p. 1160. 16. Melloni, L., et al., An adversarial collaboration protocol for testing contrasting predictions of global neuronal workspace and integrated information theory. PLOS ONE, 2023. 18(2): p. e0268577. 17. Massimini, M., et al., Breakdown of cortical effective connectivity during sleep. Science, 2005. 309(5744): p. 2228-32.
18. Casarotto, S., et al., Stratification of unresponsive patients by an independently validated index of brain complexity. Ann Neurol, 2016. 80(5): p. 718-729. 19. Findlay, G., et al., Dissociating Intelligence from Consciousness in Artificial Systems – Implications of Integrated Information Theory. Towards Conscious AI Systems Symposium - Proceedings of the AAAI 2019, 2019. 20. Findlay, G., et al., Dissociating Artificial Intelligence from Artificial Consciousness. arXiv preprint arXiv:2412.04571, 2024.

21. Integrated Information Theory Wiki. 2024; Available from: https://www.iit.wiki. 22. Li, M. and P.M.B. Vitanyi, An introduction to Kolmogorov complexity and its applications. 1997. 23. Williams, P.L. and R.D. Beer, Nonnegative Decomposition of Multivariate Information. CoRR, 2010. abs/1004.2515. 24. Steveninck, v., et al., Reading a Neural Code. Advances in Neural Information Processing Systems, 1989. 2. 25. Clark, A., Whatever next? Predictive brains, situated agents, and the future of cognitive science. The Behavioral and brain sciences, 2013. 36(3): p. 181-204. 26. Friston, K., The free-energy principle: a unified brain theory? Nat Rev Neurosci, 2010. 11(2): p. 127-38. 27. Rao, R.P. and D.H. Ballard, Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 1999. 2(1): p. 79-87. 28. Dehaene, S. and J.-P. Changeux, Experimental and theoretical approaches to conscious processing. Neuron, 2011. 70(2): p. 200-227. 29. Blum, L. and M. Blum, A theory of consciousness from a theoretical computer science perspective: Insights from the Conscious Turing Machine. Proc Natl Acad Sci U S A, 2022. 119(21): p. e2115934119. 30. Tononi, G., O. Sporns, and G.M. Edelman, Measures of degeneracy and redundancy in biological networks. Proc Natl Acad Sci U S A, 1999. 96: p. 3257-3262. 31. Tseng, S.Y., et al., Shared and specialized coding across posterior cortical areas for dynamic navigation decisions. Neuron, 2022. 110(15): p. 2484-2502.e16. 32. Buzsáki, G., Neural syntax: cell assemblies, synapsembles, and readers. Neuron, 2010. 68(3): p. 362-385. 33. Brette, R., Is coding a relevant metaphor for the brain? bioRxiv, 2018: p. 168237. 34. Tononi, G., O. Sporns, and G.M. Edelman, A complexity measure for selective matching of signals by the brain. Proc Natl Acad Sci U S A, 1996. 93(8): p. 3422-7. 35. Putnam, H., Representation and reality. 1988, Cambridge, Mass.: MIT Press. xv, 136 p. 36.
Blackmon, J., Searle’s Wall. Erkenntnis, 2013. 78(1): p. 109-117. 37. Edelman, S., Representation is representation of similarities. The Behavioral and brain sciences, 1998. 21(4): p. 449-498. 38. O’Brien, G. and J. Opie, Chapter 1 - Notes Toward a Structuralist Theory of Mental Representation, in Representation in Mind, H. Clapin, P. Staines, and P. Slezak, Editors. 2004, Elsevier: Oxford. p. 1-20. 39. Clark, A., Sensory Qualities. Sensory Qualities, 1996. 40. Gardenfors, P., Semantics Based on Conceptual Spaces. 2011. 41. Brier, S., Cybersemiotics: why information is not enough! Toronto studies in semiotics and communication. 2008, Toronto; Buffalo: University of Toronto Press. xx, 477 p. 42. Nakai, T. and S. Nishimoto, Representations and decodability of diverse cognitive functions are preserved across the human cortex, cerebellum, and subcortex. Commun Biol, 2022. 5(1): p. 1245. 43. Lemon, R.N. and S.A. Edgley, Life without a cerebellum. Brain, 2010. 133(Pt 3): p. 652-654. 44. Searle, J.R., Minds, brains, and programs. Behavioral and Brain Sciences, 1980. 3(03): p. 417-424. 45. Tononi, G., An information integration theory of consciousness. BMC Neurosci, 2004. 5: p. 42. 46. Tononi, G., Consciousness as integrated information: a provisional manifesto. Biol Bull, 2008. 215: p. 216-242. 47. Oizumi, M., L. Albantakis, and G. Tononi, From the phenomenology to the mechanisms of consciousness: integrated information theory 3.0. PLoS Comput Biol, 2014. 10(5): p. e1003588. 48. Pigorini, A., et al., Bistability breaks-off deterministic responses to intracortical stimulation during non-REM sleep. Neuroimage, 2015. 112: p. 105-13. 49. Sarasso, S., et al., Consciousness and complexity: a consilience of evidence. Neurosci Conscious, 2021. 2021(2): p. niab023. 50. Verdú, S. and T.S. Han, The role of the asymptotic equipartition property in noiseless source coding. IEEE Transactions on Information Theory, 1997. 43(3): p. 847-857. 51. Tishby, N., F.C. Pereira, and W. Bialek, The information bottleneck method. arXiv preprint physics/0004057, 2000. 52. Tononi, G., On Being: Ontological and metaphysical implications of Integrated Information Theory (IIT). forthcoming. 53. Marshall, W., et al., System Integrated Information. Entropy (Basel), 2023. 25(2): p. 334. 54. Barbosa, L.S., et al., A measure for intrinsic information. Scientific Reports, 2020. 10(1): p. 18803. 55. Marshall, W., et al., Intrinsic Units: Identifying a system’s causal grain. bioRxiv, 2024: p. 2024.04.12.589163. 56. Zaeemzadeh, A. and G. Tononi, Upper bounds for integrated information.
PLoS Comput Biol, 2024. 20(8): p. e1012323. 57. Tononi, G. and C. Cirelli, Sleep and the price of plasticity: from synaptic and cellular homeostasis to memory consolidation and integration. Neuron, 2014. 81(1): p. 12-34. 58. Tononi, G., M. Boly, and C. Cirelli, Consciousness and sleep. Neuron, 2024.
